Public Interest::Data Ethics & Practice
Who are we wrt R?
Wherever you are, you’re not alone! As we begin learning R (or learning new things in R), remember…
R is the computational engine; RStudio is the interface
For any new project in R, create an R project. Projects let RStudio leave notes for itself (e.g., history), always start a new R session when opened, and always set the working directory to the project directory. If you never have to set the working directory at the top of the script, that’s a good thing! [1]
And create a system for organizing the objects in this project!
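As a rough sketch of what this buys you: inside a project, paths can be written relative to the project root rather than as absolute paths. The folder names and the file name below are hypothetical, chosen only for illustration.

```r
# In an RStudio Project the working directory is the project root,
# so no setwd() call is needed at the top of the script.
project_root <- getwd()

# Hypothetical folder layout (an assumption, not a required structure)
folders <- c("data", "scripts", "output")
folder_paths <- file.path(project_root, folders)

# Build file paths relative to the project root instead of
# hard-coding an absolute path that only works on one computer.
data_file <- file.path("data", "survey.csv")  # hypothetical file name
data_file
```

Because the path is relative, the same script should run unchanged when the whole project folder is moved or shared.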
Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.
R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover these packages online in the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.
To use a package, install it once: in RStudio’s install dialog, enter tidyverse (or a different package name), then click on Install. Or run the equivalent command in the console:
install.packages("tidyverse")
In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
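The install-once, load-every-session pattern can be sketched as a small helper. `ensure_package()` is a hypothetical name introduced here for illustration, not a function from R or the tidyverse; it relies only on `requireNamespace()`, `install.packages()`, and `library()`.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # one-time download from CRAN
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")        # ships with R, so nothing is downloaded
# ensure_package("tidyverse")  # uncomment to install (if needed) and load
```

Installing happens once per computer; the `library()` call is the part that has to be repeated in every new session.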
The # symbol demarcates code comments.
The <- symbol is the assignment operator, which is how we name new objects in the R environment.
You can import pretty much any data format into R if you know the right command (and package):
* read.csv (base R), read_csv (tidyverse)
* read_excel (readxl)
* read.dta (foreign), read_dta (haven)

Primary data types include numeric, integer, logical, and character, plus factors.
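A minimal sketch of these data types, using only base R. The object names are made up for the example; `class()` reports the type of each object.

```r
# <- assigns a value to a name; anything after # is a comment
n_obs <- 3L                         # integer (the L suffix marks an integer)
avg   <- 2.5                        # numeric (double)
flag  <- avg > 2                    # logical
label <- "treatment"                # character
group <- factor(c("a", "b", "a"))   # factor: categories with fixed levels

# Collect the class of each object in one named vector
types <- c(
  n_obs = class(n_obs),
  avg   = class(avg),
  flag  = class(flag),
  label = class(label),
  group = class(group)
)
types
levels(group)
```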
Examining data:
* names()
* head() and tail()
* str() and glimpse()
* summary()
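To try the import and inspection functions without needing a real data file, the sketch below first writes a tiny CSV to a temporary location. The file contents are invented for the example, and only base R is used; `glimpse()` is left as a comment because it requires dplyr.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,3.5,a",
             "2,4.0,b",
             "3,NA,a"), csv_path)

dat <- read.csv(csv_path)  # base R importer

names(dat)     # column names
head(dat, 2)   # first two rows
tail(dat, 1)   # last row
str(dat)       # compact structure: type of each column
summary(dat)   # per-column summaries, including NA counts
# dplyr::glimpse(dat)  # tidyverse alternative to str()
```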
Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs/functions that define a set of common tasks.
dplyr functions are for data frames.
The first argument to dplyr functions is always a data frame.
select() helpers include functions such as starts_with(), ends_with(), and contains().
| Logical tests | Boolean operators for multiple conditions |
|---|---|
| x < y: less than | a & b: and |
| x >= y: greater than or equal to | a \| b: or |
| x == y: equal to | xor(a, b): exclusive or (exactly one is TRUE) |
| x != y: not equal to | !a: not |
| x %in% y: is a member of | |
| is.na(x): is NA | |
| !is.na(x): is not NA | |
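The tests in the table can be tried on a plain vector before using them inside filter(). This is a base R sketch with made-up values; note how comparisons involving NA return NA, which is why is.na() gets its own test.

```r
x <- c(2, NA, 7, 10)
y <- 5

x < y             # comparisons with NA give NA
x >= y
x == 7
x != 7
in_set <- x %in% c(2, 10)   # membership test; NA is simply not a member
is.na(x)

# Combine conditions: keep values that are not missing AND greater than 5
keep <- !is.na(x) & x > 5
x[keep]

# xor() is TRUE when exactly one of its arguments is TRUE
one_only <- xor(TRUE, FALSE)
both     <- xor(TRUE, TRUE)
```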
The pipe (%>%) allows you to chain together functions
by passing (piping) the result on the left into the first argument of
the function on the right. It allows us to call a series of functions in
sequence (read the pipe as “and then…”).
dataframe %>%
filter(var1 > 0) %>%
select(var1, var2, var3)
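For a runnable version of this template, the sketch below swaps in R's built-in mtcars data set and its mpg, cyl, and wt columns (a substitution for the placeholder names above), and assumes dplyr is installed. It also shows the nested call that the pipe replaces.

```r
library(dplyr)

# Piped: read top to bottom as "take mtcars, and then filter, and then select"
piped <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, wt)

# Nested equivalent: same steps, but read from the inside out
nested <- select(filter(mtcars, mpg > 25), mpg, cyl, wt)

identical(piped, nested)  # both forms give the same data frame
piped
```

Each step receives the data frame produced by the previous step as its first argument, which is why the data frame never has to be named again after the first line.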
Click to download a zipped file. Store the unzipped folder on your computer where you can find it. It contains
Artwork by @allison_horst
[1] Especially since no one seems to understand paths and directories anymore.